- spanel(1)
-
- Name
- spanel - control panel for speech recognition
-
- Synopsis
- spanel [ -display displayName ] [ -applicationDisplay displayName ]
- [ -vocabulary vocabFileName ] [ -sound soundFileName ]
- [ -recThr # ] [ -program programName ] [ -background ] [ -output ]
- [ -topLevel ] [ -noAGC ] [ -noSave ]
-
- Description
- spanel comprises components of a speech recognition system.
-
- spanel has a graphical interface that allows the user to manage
- a set of known speech templates (a vocabulary). Templates can be added,
- deleted, modified, trained, associated with an action, saved or loaded
- as a set in a vocabulary file.
-
- spanel contains algorithms which frame isolated speech utterances
- (or sounds) and train or match these tokens against known
- templates in the current vocabulary.
-
- spanel can synthesize keystroke and mouse button events in response to
- a recognition. These synthesized events are sent to the window
- with pointer focus via the X protocol.
-
- Options
- -display displayName
- designates the X server screen on which to display spanel's GUI.
-
- -applicationDisplay displayName
- designates the X server screen to which spanel will send
- an action event in response to a recognition.
-
- -vocabulary vocabFileName
- specifies a file (in spanel's vocabulary file format - probably originally
- created by spanel) which contains the desired templates to manipulate or
- match. The default filename suffix for vocabulary files is ".voc".
-
- -sound soundFileName
- the name of an aifc file to play in response to a recognition.
- By default, soundFileName is '/usr/lib/sounds/speechAck.aiff'.
- To disable this possibly annoying feature, specify '... -sound "" ...'.
-
- -recThr #
- specifies the maximum difference between an unknown token and the
- best-matched template that still qualifies as a recognition. Scores are
- distances, so lower is better and a match scoring above this value is
- rejected. The default is 400.
-
- -program programName
- spanel starts the specified program (forks and execs a child process).
- spanel terminates when the child exits.
-
- -background
- starts the spanel process but without displaying the GUI. Automatically
- starts spanel in action mode.
-
- -output
- spanel writes the action string associated with a template (or the template
- name if no action is specified) to stdout upon successful recognition.
-
- -topLevel
- tells spanel to send action events to the top level window (child of root)
- containing the window with pointer focus rather than sending the event
- directly to the window with pointer focus.
-
- -noAGC
- tells spanel not to use experimental automatic gain control algorithms.
- These algorithms attempt to find an optimum audio input level for the
- given signal and noise in the environment. Usually, the user can manually
- adjust these levels through experimentation using apanel's meters with
- better results than relying on AGC.
-
- -noSave
- tells spanel not to save templates to a central repository (currently
- lance.esd.sgi.com). This repository will enable users to quickly build
- custom robust speaker-independent vocabularies drawing on a database
- of previously trained words.
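-
- The -output option makes it possible to drive other programs from
- recognitions. The Python sketch below reads strings from a stream and calls
- a handler for each one. It assumes spanel writes one action string per line,
- which this page does not state; the names dispatch and handlers and the
- example strings are hypothetical.
-
```python
import io


def dispatch(stream, handlers):
    """Call handlers[text]() for each non-empty line read from stream.

    Assumes one action string (or template name) per line; that framing
    is an assumption, not something documented for spanel.
    Returns the list of strings that had no handler.
    """
    unhandled = []
    for line in stream:
        text = line.rstrip("\n")
        if not text:
            continue
        handler = handlers.get(text)
        if handler is None:
            unhandled.append(text)
        else:
            handler()
    return unhandled


# Stand-in for spanel's stdout so the sketch runs without spanel.
log = []
fake_stdout = io.StringIO("lights_on\nzulu\nlights_off\n")
leftover = dispatch(fake_stdout, {
    "lights_on": lambda: log.append("on"),
    "lights_off": lambda: log.append("off"),
})
```
-
- With a real spanel, the stream could instead be the stdout pipe of a child
- process started with -background and -output.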
-
- GUI
- Through the menu bar the user may bring up a file submenu or a utilities
- submenu. The file submenu has options for loading, saving, creating a new
- vocabulary, and quitting. The utilities submenu has options for bringing up
- apanel and clearing previous template training.
-
- The top window is a scrollable status window which simply lists a few lines
- of textual comments like "saved vocabulary test.voc" and "added template
- zulu".
-
- Below the status window is a one line prompt window indicating what spanel
- expects the user to do next (like training prompts: "say the word 'zero'").
-
- On the right below the prompt window is a multiline scrollable vocabulary
- window listing all the templates in the currently loaded vocabulary.
- The templates can be selected for various operations.
-
- On the left below the prompt window are three check boxes labeled action,
- score, and train. Selecting any one of them causes spanel to listen.
- When the action mode is selected, spanel will send events to the pointer
- window upon successful recognition. When the score mode is selected, spanel
- will display the score (distance from framed speech token) of each template
- in the vocabulary window and sort the templates by score (best/lowest first).
- When the train mode is selected, spanel uses the prompt window to
- request the utterance of a vocabulary word then trains the
- corresponding template with the next frame of speech from the user.
- Technically these modes are not mutually exclusive, but in practice
- they are seldom used concurrently.
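-
- The ordering shown in score mode and the acceptance test applied in action
- mode can be pictured with the following Python sketch. It is a model of the
- behavior described on this page, not spanel's code; the function names and
- example scores are hypothetical, and 400 is the documented -recThr default.
-
```python
def rank_templates(scores):
    """Return (name, score) pairs sorted best (lowest score) first."""
    return sorted(scores.items(), key=lambda item: item[1])


def recognize(scores, rec_thr=400):
    """Return the best template name, or None if its score exceeds rec_thr.

    scores maps template name -> distance from the framed speech token.
    """
    ranked = rank_templates(scores)
    if not ranked:
        return None
    name, score = ranked[0]
    return name if score <= rec_thr else None
```
-
- For example, recognize({"zero": 212, "one": 655}) yields "zero", while
- recognize({"zero": 480, "one": 655}) yields None because the best score
- is above the threshold.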
-
- The "correct" button is below the three check boxes and allows the user to
- perform training beyond what can be done in train mode. The "correct" button
- is active only in score mode and should be pushed when spanel makes an
- incorrect recognition and the user has selected the correct word from the
- vocabulary window. The result of correcting spanel is a train (with the
- already spoken utterance) of the correct (selected) word and an untrain of
- the incorrect word (the one spanel scored best). This will reduce the
- chance of spanel making the same mistake in the future. Note that correcting
- will state an untrain error (only informative - not harmful) if the word has
- not been trained with enough passes (30) for untraining to have pleasant
- effects.
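-
- The train/untrain pairing behind the "correct" button can be sketched in
- Python as follows. The template representation here (a running mean of
- feature vectors plus a pass count) is purely an assumption made for
- illustration; this page does not describe spanel's internal template format.
- Only the pass count of 30 comes from the text above.
-
```python
MIN_PASSES_FOR_UNTRAIN = 30  # pass count quoted in the text above


class Template:
    """Hypothetical template: running mean of features plus a pass count."""

    def __init__(self, name, size):
        self.name = name
        self.mean = [0.0] * size
        self.passes = 0

    def train(self, features):
        """Fold one utterance into the running mean."""
        self.passes += 1
        self.mean = [m + (x - m) / self.passes
                     for m, x in zip(self.mean, features)]

    def untrain(self, features):
        """Push the mean away from features by reversing one training step.

        Returns False, leaving the template unchanged, if it has too few passes.
        """
        if self.passes < MIN_PASSES_FOR_UNTRAIN:
            return False
        n = self.passes
        self.mean = [(m * n - x) / (n - 1)
                     for m, x in zip(self.mean, features)]
        self.passes = n - 1
        return True


def correct(selected, best_scored, features):
    """Train the user-selected template and untrain the wrongly matched one.

    Returns an informative message if the untrain was skipped, else None.
    """
    selected.train(features)
    if not best_scored.untrain(features):
        return "untrain error: %s has fewer than %d passes" % (
            best_scored.name, MIN_PASSES_FOR_UNTRAIN)
    return None
```
-
- In this model the selected template is always trained; only the untrain
- half is skipped when the mistaken template has too few passes.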
-
- The delete button deletes the selected templates.
-
- The button below the delete button is used for modifying or adding a template
- name (depending on whether a template is selected or not, respectively).
- Pushing this button modifies or adds a template using the text in the field
- adjacent to this button on the right. Depressing the return key while the
- corresponding text field is active has the same effect as pushing the button.
- As a convention, template names should use lower-case and underscores
- between words.
-
- The button below the template button is used for specifying actions.
- Pushing this button (or depressing the return key while in the corresponding
- text field to the right) associates the action with the selected template.
- Actions are represented by text strings and are converted into X events.
- Each character of plain text is converted into at least one
- XKeyPress/XKeyRelease event pair (depending on whether or not the X event's
- KeyCode for the specified KeySym needs a shifted KeyCode).
- Symbolic names are enclosed in angle brackets (like <Escape>) and follow
- the X conventions found in /usr/include/X11/keysymdef.h. For instance
- the Control-D key commonly used for EOF to a shell is represented as an
- action with the string "<Control>d". Also "<Return>" will be used often at
- the ends of some action strings. Modifier keys, such as the "<Control>" in
- "<Control>d" are released after one plain (non-symbolic) character (the "d").
- The backslash character can escape special characters such as the opening
- angle bracket '<' and itself '\'.
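-
- The action string rules above can be sketched as a small Python parser that
- turns a string into a list of press/release events. This is an illustration
- only: the MODIFIERS set, the uppercase-means-shift shortcut, and the final
- release of any modifier still held at the end of the string are assumptions,
- and spanel's real conversion consults the X keymap.
-
```python
MODIFIERS = {"Control", "Shift", "Meta", "Alt"}  # assumed set, for illustration


def parse_action(action):
    """Turn an action string into a list of (kind, key) event tuples."""
    events = []
    held = []  # modifiers pressed but not yet released
    i = 0
    while i < len(action):
        ch = action[i]
        if ch == "<":
            end = action.find(">", i)
            if end == -1:
                raise ValueError("unterminated symbolic name")
            name = action[i + 1:end]
            i = end + 1
            events.append(("press", name))
            if name in MODIFIERS:
                held.append(name)  # stays down until a plain character
            else:
                events.append(("release", name))
            continue
        if ch == "\\":  # backslash escapes '<' and itself
            i += 1
            if i == len(action):
                raise ValueError("dangling backslash")
            ch = action[i]
        i += 1
        needs_shift = ch.isupper()  # simplification of the KeyCode lookup
        if needs_shift:
            events.append(("press", "Shift"))
        events.append(("press", ch))
        events.append(("release", ch))
        if needs_shift:
            events.append(("release", "Shift"))
        while held:  # modifiers are released after one plain character
            events.append(("release", held.pop()))
    while held:  # assumption: release anything still held at the end
        events.append(("release", held.pop()))
    return events
```
-
- Under these assumptions, "<Control>d" becomes press Control, press d,
- release d, release Control, and an uppercase letter is wrapped in an extra
- Shift press/release pair.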
-
- spanel's graphical components resize to some extent with the window.
-
- Capabilities
- The accuracy of spanel will vary greatly (from near-perfect to unusable)
- depending on the audio input. The biggest factor will be the microphone
- type and its placement. The Indigo microphone can work but has the
- disadvantage of picking up noise from all directions and of not having a
- fixed location. A uni-directional noise-cancelling headset microphone
- should be used to attain highest accuracy. However, the Indigo microphone
- can be held about four inches off the side of the mouth (to avoid wind
- noise) for an acceptable signal to noise ratio, or if the user is blessed
- with an unusually quiet work environment the Indigo microphone can be
- positioned on the desktop. The user should experiment to get best results.
- Generally, positioning the mic closer to the mouth will provide higher signal
- levels, while the keyboard and monitor may cause interference.
- The user should observe signal and noise levels using apanel's meters for
- estimating the performance of various microphone positions and for
- setting the gain.
-
- An experimental automatic gain control is implemented for situations where
- the signal and noise levels are not known ahead of time and cannot be
- adjusted manually. Most users can adjust these levels better and should
- do so by invoking spanel with -noAGC and raising the sliders on apanel
- until spanel starts responding to normal environmental noise (usually just
- shy of full gain). If errors from spanel occur that indicate overflowing
- audio buffers (a maximum of four seconds of speech is allowed per utterance),
- the audio level is set too high and the algorithms are being incorrectly
- triggered by noise.
-
- The algorithms are currently capable of framing isolated speech.
- This means each utterance or sound must be preceded and followed by
- some amount of silence (a few tenths of a second).
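-
- One common way to frame isolated utterances is to compare per-frame signal
- levels against a threshold and require a run of quiet frames on both sides.
- The Python sketch below shows that general technique; it is not spanel's
- algorithm, and the function name, parameters, and frame size are
- hypothetical.
-
```python
def frame_utterances(levels, threshold, min_silence, max_len):
    """Return (start, end) index pairs of isolated utterances.

    levels: per-frame signal levels; a frame is "loud" above threshold.
    An utterance may start only after min_silence quiet frames and ends
    once min_silence quiet frames follow it. end is one past the last
    loud frame. Utterances longer than max_len frames are dropped.
    """
    utterances = []
    start = None  # index of the first loud frame of the current utterance
    quiet = 0     # number of consecutive quiet frames seen so far
    for i, level in enumerate(levels):
        if level > threshold:
            if start is None and quiet >= min_silence:
                start = i
            quiet = 0
        else:
            quiet += 1
            if start is not None and quiet == min_silence:
                end = i - min_silence + 1
                if end - start <= max_len:
                    utterances.append((start, end))
                start = None
    return utterances
```
-
- With 10 ms frames (an assumed frame size), min_silence=30 would correspond
- to three tenths of a second of silence and max_len=400 to the four-second
- limit on utterance length.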
-
- Although the algorithms are speaker independent, the vocabulary
- is speaker dependent until it has been trained with various samples of
- speakers from the target audience. The algorithms will start working
- after four training passes and will continue to become more robust with
- hundreds of speakers making several training passes each.
-
- Limitations
- Training with spanel is not an optimal way to develop a vocabulary.
- An application-oriented training scenario would more accurately capture words
- as the speaker really says them in the course of using the application.
- This will require a training API or toolkit interface to the algorithms and
- cooperation from the application engineers.
-
- Bugs
- Changing the input source sampling rate to anything other than 8 kHz can
- lead to unpredictable results.
-
- The source must be set correctly (usually the microphone) for spanel to
- operate correctly.
-
- Currently only selection of one template at a time is possible.
-
- Spanel always keeps one audio port open for input.
-
- When not running in background mode, spanel takes way too much of the
- CPU time due to improper handling of both audio and X input.
-
- After deleting templates, spanel gets the UI order somewhat mixed up.
- In order to prevent spanel from operating on the wrong template
- after a template selection in the UI, the vocabulary should be reloaded.
- This is a bad bug and will be fixed ASAP.
-
- Requirements
- spanel currently works only on systems with support for the audio library:
- the 4D30, the 4D35 and the Iris Indigo.
-
- Spanel uses one audio port for input and possibly one other if output sound
- acknowledgment is enabled.
-
- While listening in background mode, spanel requires approximately 5% of the
- CPU power, with a short burst when a framed token is matched against the
- templates in the currently loaded vocabulary.
-
- Also Related
- see apanel(1).
-
- In the future, an API with documentation will be available for
- programmatic interfacing to the recognition algorithms.
-
-